Automatic Indexing and Generating of Content Graphs from Unrestricted Text
Abstract
For quite some time, I have been exploring the surface signals of language and trying to put them to as much use as possible, primarily in morphology-based part-of-speech assignment (Källgren 1984a,b,c, 1985) and pattern-based syntactic analysis (Källgren 1987). This kind of large-scale, probabilistic parsing on the basis of morphological and syntactic patterns has lately come into use in several projects. Some models that have been documented are the UCREL parser in England for the Brown and LOB corpora (Garside & Leech 1987), the VOLSUNGA parser in the USA for the Brown corpus (DeRose 1988), Ken Church's stochastic models (Church 1988), as well as other work in the US (Black 1988), but I am sure work along these lines is going on in several places. The impetus behind the works just mentioned is mainly a need for analyzing large amounts of unrestricted text in a way that is not too resource-demanding, either in time or in computing power. As a secondary goal, I have seen the needs of large-scale information retrieval. Keeping my original surface orientation, I have gone further from the analysis into parts of speech and constituents and started to look at the extraction and representation of some kind of 'content' from the surface of texts, without any kind of knowledge base support. This might seem quite impossible. Many of the other papers in this volume deal with how unavoidable inferences are to the comprehension of text, and of course they are right. If the aim is to build a computerized system that will in any way simulate language understanding, it is necessary to have a large knowledge base and mechanisms for making inferences from it, but there are also applications where the human knowledge and inferencing capacities can be used instead.
My approach in the experiment to be reported here has been to let each one do
Related resources
A Two-Stage Split-and-Select Model for Automatic Indexing of Persian Texts
Purpose: Each language has its own problems, which calls for automatic indexing models appropriate to each language. These models should address the exhaustivity and specificity of indexing. This paper aims to introduce and evaluate a model suited to Persian automatic indexing. This model suggests to break the text into the particles of candidate terms and to c...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such massive amounts of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Reflections on Image Indexing: A Picture Is Worth a Thousand Words
Purpose: This paper presents various image indexing techniques and discusses their advantages and limitations. Methodology: Through a review of the literature, it identifies three main image indexing techniques, namely concept-based image indexing, content-based image indexing and folksonomy. It then describes each technique. Findings: Concept-based image indexing is te...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
Noun-Phrase Analysis in Unrestricted Text for Information Retrieval
Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe...
A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...